[fix](fe) Reject multi-column NGRAM_BF indexes #64343

Merged
airborne12 merged 2 commits into apache:master from airborne12:doris-19296-ngbf-multicol
Jun 12, 2026

Conversation

@airborne12

@airborne12 airborne12 commented Jun 10, 2026

Member

What problem does this PR solve?

Related PR: None

Problem Summary:

Creating an NGRAM_BF index with multiple columns passed FE validation and could reach BE tablet creation, where tablet metadata expects each NGRAM_BF index to bind exactly one column. This rejects invalid multi-column NGRAM_BF definitions during FE analysis for both inline table indexes and CREATE INDEX.
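
As a rough illustration of the check described above, the sketch below rejects an NGRAM_BF definition that names anything other than exactly one column. The class, enum, method name, exception type, and message are hypothetical stand-ins chosen for this example; the actual FE code is not shown in this conversation and may differ.

```java
import java.util.List;

// Hypothetical stand-in for the FE index definition; not the actual Doris class.
final class NgramBfColumnCheck {
    // OTHER is a placeholder; this sketch says nothing about rules for other index types.
    enum IndexType { NGRAM_BF, OTHER }

    /** Throws if an NGRAM_BF index names anything other than exactly one column. */
    static void checkColumnCount(IndexType type, List<String> columns) {
        if (type == IndexType.NGRAM_BF && columns.size() != 1) {
            throw new IllegalArgumentException(
                    "NGRAM_BF index must bind exactly one column, got " + columns.size());
        }
    }

    public static void main(String[] args) {
        // A single-column definition passes silently.
        checkColumnCount(IndexType.NGRAM_BF, List.of("c1"));
        // A two-column definition is rejected before it could reach tablet creation.
        try {
            checkColumnCount(IndexType.NGRAM_BF, List.of("c1", "c2"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at analysis time keeps the error close to the DDL statement that caused it, instead of surfacing later during BE tablet creation.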

Release note

Reject invalid multi-column NGRAM_BF index definitions during DDL analysis.

Check List (For Author)

  • Test

    • Regression test
      • Added coverage in regression-test/suites/index_p0/test_ngram_bloomfilter_index.groovy for inline table index and CREATE INDEX paths. Not run locally because no worktree Doris cluster was started.
    • Unit Test
      • ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest
    • Manual test (add detailed scripts or steps below)
      • ./build.sh --fe
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. Invalid multi-column NGRAM_BF index definitions now fail during FE analysis.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: DORIS-19296

Problem Summary: Creating an NGRAM_BF index with multiple columns passed FE validation and could reach BE tablet creation, where tablet metadata expects each NGRAM_BF index to bind exactly one column. This rejects invalid DDL during FE analysis for both inline table indexes and CREATE INDEX.

### Release note

Reject invalid multi-column NGRAM_BF index definitions during DDL analysis.

### Check List (For Author)

- Test: Unit Test / Build

    - ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest

    - ./build.sh --fe

    - Added regression coverage under index_p0; not run locally because no worktree Doris cluster was started.

- Behavior changed: Yes. Invalid multi-column NGRAM_BF index definitions now fail in FE analysis.

- Does this need documentation: No
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12

Member Author

run buildall

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-H: Total hot run time: 29324 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dff748278a12a98bd4df3f0486b489e1a4a88671, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17643	4057	4230	4057
q2
q3	10778	1435	815	815
q4	4687	507	349	349
q5	7559	897	580	580
q6	185	175	136	136
q7	772	852	631	631
q8	9406	1540	1580	1540
q9	5845	4495	4415	4415
q10	6775	1813	1535	1535
q11	433	275	249	249
q12	633	426	291	291
q13	18206	3381	2721	2721
q14	257	256	247	247
q15
q16	825	767	708	708
q17	958	978	1010	978
q18	6906	5736	5575	5575
q19	1376	1287	1083	1083
q20	505	415	263	263
q21	6244	2801	2793	2793
q22	461	372	358	358
Total cold run time: 100454 ms
Total hot run time: 29324 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5136	4691	4857	4691
q2
q3	4843	5352	4692	4692
q4	2092	2178	1370	1370
q5	4769	4871	4603	4603
q6	232	177	123	123
q7	1909	1896	1598	1598
q8	2368	2070	2071	2070
q9	7882	7583	7370	7370
q10	4742	4700	4224	4224
q11	535	385	351	351
q12	721	740	526	526
q13	2994	3430	2764	2764
q14	277	281	249	249
q15
q16	678	696	605	605
q17	1275	1247	1249	1247
q18	7186	6884	6816	6816
q19	1124	1094	1122	1094
q20	2222	2219	1951	1951
q21	5258	4558	4378	4378
q22	524	448	424	424
Total cold run time: 56767 ms
Total hot run time: 51146 ms
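
The report does not label its columns. The figures are consistent with this reading: the first number in each row is the cold run, the next two are hot runs, and the last is the faster of the two hot runs. Summing the first and last columns of Round 1 reproduces the reported 100454 ms and 29324 ms. The sketch below shows that arithmetic on the first three populated Round 1 rows (q1, q3, q4). The class and method names are hypothetical and this is not the benchmark script itself.

```java
// Hypothetical helper illustrating how the totals appear to be derived.
final class TpchTotals {
    // Each row holds {cold, hot1, hot2} in milliseconds.
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    // The per-query hot time is taken as the faster of the two hot runs.
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {17643, 4057, 4230}, // q1
            {10778, 1435, 815},  // q3
            {4687, 507, 349},    // q4
        };
        // prints "cold=33108 hot=5221"
        System.out.println("cold=" + totalCold(rows) + " hot=" + totalHot(rows));
    }
}
```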

@hello-stephen

Contributor
TPC-DS: Total hot run time: 169439 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dff748278a12a98bd4df3f0486b489e1a4a88671, data reload: false

query5	4306	638	479	479
query6	456	197	179	179
query7	4848	543	303	303
query8	364	211	204	204
query9	8776	4090	4056	4056
query10	443	304	262	262
query11	5936	2330	2205	2205
query12	155	107	106	106
query13	1252	614	419	419
query14	6440	5407	5095	5095
query14_1	4408	4429	4447	4429
query15	207	197	179	179
query16	1047	468	377	377
query17	1182	730	604	604
query18	2758	483	363	363
query19	225	187	151	151
query20	117	110	104	104
query21	231	140	121	121
query22	13687	13655	13468	13468
query23	17373	16627	16149	16149
query23_1	16272	16274	16306	16274
query24	7466	1758	1335	1335
query24_1	1342	1316	1316	1316
query25	587	480	402	402
query26	1319	321	169	169
query27	2565	554	340	340
query28	4384	2070	2047	2047
query29	1115	645	504	504
query30	313	236	201	201
query31	1125	1087	968	968
query32	106	64	64	64
query33	601	331	306	306
query34	1179	1154	643	643
query35	743	794	679	679
query36	1382	1344	1245	1245
query37	163	108	91	91
query38	3216	3183	3089	3089
query39	928	924	897	897
query39_1	879	882	877	877
query40	235	120	98	98
query41	65	62	61	61
query42	93	93	92	92
query43	313	313	280	280
query44	
query45	191	186	177	177
query46	1076	1187	768	768
query47	2312	2365	2210	2210
query48	407	408	296	296
query49	630	482	365	365
query50	999	379	261	261
query51	4373	4415	4276	4276
query52	89	89	77	77
query53	249	280	194	194
query54	271	215	194	194
query55	83	75	70	70
query56	249	219	223	219
query57	1436	1412	1332	1332
query58	249	224	213	213
query59	1584	1666	1466	1466
query60	295	247	221	221
query61	161	162	149	149
query62	706	651	582	582
query63	234	185	191	185
query64	2516	780	668	668
query65	
query66	1738	455	356	356
query67	29800	29823	29579	29579
query68	
query69	453	294	272	272
query70	997	939	940	939
query71	299	221	214	214
query72	3064	2727	2426	2426
query73	835	766	449	449
query74	5116	4970	4787	4787
query75	2668	2599	2242	2242
query76	2369	1164	752	752
query77	353	366	292	292
query78	12453	12613	11828	11828
query79	1375	1118	755	755
query80	600	478	408	408
query81	452	277	243	243
query82	586	162	123	123
query83	352	274	255	255
query84	
query85	884	540	444	444
query86	362	320	274	274
query87	3398	3342	3218	3218
query88	3636	2745	2716	2716
query89	418	381	342	342
query90	2035	184	183	183
query91	193	166	141	141
query92	64	61	58	58
query93	1479	1474	848	848
query94	536	372	299	299
query95	681	393	342	342
query96	1098	810	320	320
query97	2702	2736	2543	2543
query98	212	208	205	205
query99	1148	1201	1052	1052
Total cold run time: 251372 ms
Total hot run time: 169439 ms

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/8) 🎉
Increment coverage report
Complete coverage report

@airborne12

Member Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29625 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4911b16313c561570266364cbc651a76056b617c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17732	4319	4194	4194
q2
q3	10804	1451	844	844
q4	4683	482	345	345
q5	7521	903	582	582
q6	195	192	145	145
q7	817	886	644	644
q8	9812	1599	1736	1599
q9	6917	4547	4606	4547
q10	6808	1841	1530	1530
q11	441	282	254	254
q12	643	438	304	304
q13	18281	3495	2847	2847
q14	272	263	252	252
q15
q16	832	783	715	715
q17	1009	961	1034	961
q18	7084	5738	5772	5738
q19	1207	1208	1127	1127
q20	553	419	269	269
q21	5677	2650	2421	2421
q22	449	366	307	307
Total cold run time: 101737 ms
Total hot run time: 29625 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4495	4456	4430	4430
q2
q3	4596	4998	4363	4363
q4	2134	2256	1401	1401
q5	4536	4370	4358	4358
q6	235	175	133	133
q7	2389	1860	1643	1643
q8	2608	2207	2227	2207
q9	8070	8005	8073	8005
q10	4858	4812	4362	4362
q11	804	450	401	401
q12	761	765	544	544
q13	3297	3761	2955	2955
q14	308	331	296	296
q15
q16	748	743	642	642
q17	1389	1371	1400	1371
q18	8088	7409	6966	6966
q19	1134	1127	1137	1127
q20	2262	2233	1930	1930
q21	5331	4591	4494	4494
q22	511	469	405	405
Total cold run time: 58554 ms
Total hot run time: 52033 ms

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-DS: Total hot run time: 169362 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4911b16313c561570266364cbc651a76056b617c, data reload: false

query5	4428	612	484	484
query6	451	193	176	176
query7	4828	596	303	303
query8	359	210	213	210
query9	8771	4133	4079	4079
query10	458	304	251	251
query11	5975	2384	2219	2219
query12	159	103	98	98
query13	1274	618	437	437
query14	6419	5391	5080	5080
query14_1	4350	4362	4338	4338
query15	204	197	173	173
query16	987	427	389	389
query17	920	696	558	558
query18	2451	459	334	334
query19	207	177	135	135
query20	108	106	106	106
query21	215	134	115	115
query22	13659	13598	13501	13501
query23	17443	16654	16249	16249
query23_1	16352	16325	16460	16325
query24	7869	1770	1309	1309
query24_1	1299	1311	1334	1311
query25	596	454	400	400
query26	1303	330	170	170
query27	2728	550	330	330
query28	4512	2041	2034	2034
query29	1105	641	518	518
query30	314	239	200	200
query31	1121	1095	967	967
query32	107	63	59	59
query33	531	322	266	266
query34	1169	1140	653	653
query35	749	784	714	714
query36	1392	1451	1231	1231
query37	152	107	93	93
query38	3208	3177	3072	3072
query39	945	920	905	905
query39_1	896	886	901	886
query40	224	126	104	104
query41	69	67	65	65
query42	98	103	94	94
query43	322	328	279	279
query44	
query45	197	187	183	183
query46	1095	1212	742	742
query47	2441	2404	2257	2257
query48	409	389	299	299
query49	646	484	357	357
query50	1015	352	255	255
query51	4334	4307	4228	4228
query52	89	90	80	80
query53	248	269	191	191
query54	285	241	219	219
query55	91	77	73	73
query56	259	232	226	226
query57	1419	1395	1314	1314
query58	246	221	224	221
query59	1624	1683	1416	1416
query60	310	260	241	241
query61	183	176	212	176
query62	702	643	595	595
query63	231	188	189	188
query64	2521	753	597	597
query65	
query66	1793	458	344	344
query67	29655	29723	29554	29554
query68	
query69	428	301	258	258
query70	990	923	957	923
query71	288	222	212	212
query72	2952	2631	2316	2316
query73	848	788	443	443
query74	5118	4973	4726	4726
query75	2652	2629	2237	2237
query76	2367	1158	761	761
query77	336	384	285	285
query78	12321	12483	11808	11808
query79	1428	1070	776	776
query80	1282	470	391	391
query81	524	279	237	237
query82	608	163	119	119
query83	324	275	247	247
query84	
query85	912	493	413	413
query86	454	288	287	287
query87	3407	3347	3161	3161
query88	3642	2764	2752	2752
query89	424	381	334	334
query90	1881	185	181	181
query91	183	161	134	134
query92	61	61	55	55
query93	1657	1425	904	904
query94	710	359	302	302
query95	662	378	439	378
query96	1021	840	320	320
query97	2742	2689	2558	2558
query98	212	213	207	207
query99	1138	1175	1058	1058
Total cold run time: 252285 ms
Total hot run time: 169362 ms

@hello-stephen

Contributor
TPC-H: Total hot run time: 29258 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4911b16313c561570266364cbc651a76056b617c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17700	4239	4254	4239
q2
q3	10838	1430	804	804
q4	4681	471	341	341
q5	7582	933	599	599
q6	186	179	139	139
q7	794	831	640	640
q8	9335	1525	1710	1525
q9	5827	4498	4483	4483
q10	6620	1756	1531	1531
q11	431	280	252	252
q12	637	429	302	302
q13	18117	3557	2765	2765
q14	268	266	240	240
q15
q16	825	771	713	713
q17	987	999	1038	999
q18	6946	5837	5630	5630
q19	1315	1343	1079	1079
q20	527	418	261	261
q21	6024	2660	2402	2402
q22	438	370	314	314
Total cold run time: 100078 ms
Total hot run time: 29258 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4506	4426	4446	4426
q2
q3	4591	5011	4372	4372
q4	2270	2318	1492	1492
q5	4546	4422	4382	4382
q6	237	181	133	133
q7	1785	2191	1804	1804
q8	2819	2405	2400	2400
q9	8402	8248	8113	8113
q10	4861	4777	4344	4344
q11	599	427	412	412
q12	808	809	569	569
q13	3434	3674	2972	2972
q14	315	313	268	268
q15
q16	717	749	671	671
q17	1410	1359	1517	1359
q18	8122	7504	7281	7281
q19	1106	1106	1101	1101
q20	2235	2222	1951	1951
q21	5569	4854	4734	4734
q22	522	472	446	446
Total cold run time: 58854 ms
Total hot run time: 53230 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168815 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4911b16313c561570266364cbc651a76056b617c, data reload: false

query5	4307	630	479	479
query6	434	192	175	175
query7	4829	563	330	330
query8	367	219	209	209
query9	8753	3972	4002	3972
query10	442	308	242	242
query11	5853	2366	2192	2192
query12	154	99	94	94
query13	1244	621	421	421
query14	6374	5358	5103	5103
query14_1	4414	4412	4373	4373
query15	206	194	180	180
query16	1008	478	422	422
query17	1133	720	578	578
query18	2711	496	354	354
query19	214	181	147	147
query20	116	108	106	106
query21	220	150	128	128
query22	13693	13662	13395	13395
query23	17401	16506	16217	16217
query23_1	16292	16305	16280	16280
query24	7601	1785	1299	1299
query24_1	1319	1328	1336	1328
query25	602	468	394	394
query26	1319	318	173	173
query27	2651	557	334	334
query28	4421	2010	2007	2007
query29	1098	631	502	502
query30	313	241	193	193
query31	1125	1085	972	972
query32	110	67	60	60
query33	527	326	261	261
query34	1198	1149	725	725
query35	753	786	687	687
query36	1345	1349	1211	1211
query37	150	103	88	88
query38	3242	3153	3035	3035
query39	922	914	906	906
query39_1	902	873	872	872
query40	213	122	99	99
query41	64	61	61	61
query42	94	93	97	93
query43	327	328	291	291
query44	
query45	193	188	179	179
query46	1129	1227	738	738
query47	2333	2298	2230	2230
query48	406	397	295	295
query49	646	473	347	347
query50	1006	354	246	246
query51	4331	4388	4211	4211
query52	88	87	77	77
query53	238	272	183	183
query54	263	213	196	196
query55	79	78	71	71
query56	242	223	225	223
query57	1427	1396	1298	1298
query58	237	213	210	210
query59	1610	1726	1427	1427
query60	283	246	225	225
query61	156	153	151	151
query62	696	648	587	587
query63	234	190	190	190
query64	2478	773	625	625
query65	
query66	1721	452	387	387
query67	29900	29642	29566	29566
query68	
query69	440	305	262	262
query70	950	923	919	919
query71	293	217	210	210
query72	2917	2642	2274	2274
query73	850	770	460	460
query74	5110	4971	4823	4823
query75	2645	2572	2223	2223
query76	2341	1178	799	799
query77	376	368	298	298
query78	12364	12517	11987	11987
query79	1406	1062	751	751
query80	1277	496	398	398
query81	502	282	246	246
query82	582	154	118	118
query83	341	277	243	243
query84	
query85	889	501	415	415
query86	422	307	265	265
query87	3420	3382	3244	3244
query88	3613	2744	2726	2726
query89	436	384	333	333
query90	1988	180	178	178
query91	172	163	135	135
query92	64	63	58	58
query93	1501	1526	840	840
query94	741	363	305	305
query95	662	376	355	355
query96	1066	800	360	360
query97	2675	2725	2557	2557
query98	206	208	202	202
query99	1140	1174	1038	1038
Total cold run time: 251942 ms
Total hot run time: 168815 ms

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/2) 🎉
Increment coverage report
Complete coverage report

@airborne12

Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR rejects multi-column NGRAM_BF definitions in IndexDefinition.validate(), which is reached by inline table indexes, ALTER TABLE ADD INDEX, and standalone CREATE INDEX. The new unit and regression cases cover the invalid multi-column DDLs, and the build-index unit case covers the deferred path where columns are intentionally absent.
  • Scope: The code change is small and focused on the existing single-column validation branch, with no unrelated behavior changes.
  • Concurrency/lifecycle: No new concurrency is introduced. The non-intuitive lifecycle is the deferred BUILD INDEX constructor with cols == null; the added early return for NGRAM_BF preserves that path.
  • Config/compatibility/protocol: No new configuration, storage format, or FE-BE protocol fields are added.
  • Parallel paths: Nereids parser paths for inline indexes, alter add index, and create index all construct the same IndexDefinition; build index is handled separately by the deferred branch.
  • Conditional checks: The new condition aligns NGRAM_BF with existing single-column index types while exempting deferred build validation.
  • Tests/results: Added FE unit coverage and regression negative cases. No .out update is needed because the regression additions are exception-only.
  • Observability, transaction/persistence, performance: Not applicable for this validation-only change; no issue found.
  • User focus: No additional review focus was provided.

I attempted to run ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest, but the runner failed during generated-code setup before executing tests because thirdparty/installed/bin/protoc is missing.
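
The deferred-path behavior noted in the checkpoints above can be sketched roughly as follows, assuming that a null column list marks a BUILD INDEX definition whose columns are not yet known. The class name, method name, return convention, and exception type are hypothetical and only illustrate the ordering of the two checks; they are not the FE's actual code.

```java
import java.util.List;

// Hypothetical sketch; names and exception type are illustrative, not the Doris FE API.
final class DeferredNgramBfValidation {
    /**
     * Returns false when validation is skipped because the column list is absent
     * (the deferred BUILD INDEX case), true when a single-column definition passes.
     */
    static boolean validateNgramBf(List<String> columns) {
        if (columns == null) {
            return false; // deferred path: nothing to check yet
        }
        if (columns.size() != 1) {
            throw new IllegalArgumentException(
                    "NGRAM_BF index must bind exactly one column, got " + columns.size());
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(validateNgramBf(null));          // deferred: skipped
        System.out.println(validateNgramBf(List.of("c1"))); // single column: accepted
        try {
            validateNgramBf(List.of("c1", "c2"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking for the absent column list first is what keeps the deferred path from tripping over the new column-count rule.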

@eldenmoon eldenmoon left a comment

Member

LGTM

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Jun 12, 2026

@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

@airborne12 airborne12 merged commit 059f56d into apache:master Jun 12, 2026
34 checks passed
@airborne12 airborne12 deleted the doris-19296-ngbf-multicol branch June 12, 2026 10:18
Labels

approved (Indicates a PR has been approved by one committer.), dev/4.1.x, dev/4.1.x-conflict, reviewed

4 participants